Overview

Dataset statistics

Number of variables: 12
Number of observations: 1586614
Missing cells: 68148
Missing cells (%): 0.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 145.3 MiB
Average record size in memory: 96.0 B
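
The average record size can be cross-checked from the other two figures. The sketch below is only an arithmetic check on the numbers reported above; the variable and function names are illustrative, and the 8-bytes-per-cell figure is an assumption (it matches 64-bit numeric columns and object pointers, not the full size of the strings).

```python
# Figures copied from the "Dataset statistics" block of the report.
n_observations = 1_586_614
n_variables = 12
total_mib = 145.3  # total size in memory, in MiB

BYTES_PER_MIB = 1024 ** 2


def average_record_size(total_size_mib: float, n_rows: int) -> float:
    """Average number of bytes per row implied by a total size in MiB."""
    return total_size_mib * BYTES_PER_MIB / n_rows


avg_bytes = average_record_size(total_mib, n_observations)
print(f"average record size: {avg_bytes:.1f} B")

# Assumption: every cell occupies 8 bytes (float64/int64 value or object pointer).
assumed_bytes_per_cell = 8
print(f"{n_variables} variables x {assumed_bytes_per_cell} B = "
      f"{n_variables * assumed_bytes_per_cell} B per record")

# The same assumption gives the per-column memory size in MiB.
column_mib = n_observations * assumed_bytes_per_cell / BYTES_PER_MIB
print(f"per-column size: {column_mib:.1f} MiB")
```

Under that assumption the per-column figure agrees with the "Memory size" reported for each variable below.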

Variable types

Numeric: 9
Categorical: 3

Warnings

brewery_name has a high cardinality: 5742 distinct values
review_profilename has a high cardinality: 33387 distinct values
beer_name has a high cardinality: 56857 distinct values
beer_abv has 67785 (4.3%) missing values

Reproduction

Analysis started: 2021-02-24 10:21:12.796559
Analysis finished: 2021-02-24 10:23:26.664204
Duration: 2 minutes and 13.87 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml

Variables

brewery_id
Real number (ℝ≥0)

Distinct: 5840
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3130.099202
Minimum: 1
Maximum: 28003
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 30
Q1: 143
median: 429
Q3: 2372
95-th percentile: 16866
Maximum: 28003
Range: 28002
Interquartile range (IQR): 2229

Descriptive statistics

Standard deviation: 5578.103987
Coefficient of variation (CV): 1.782085368
Kurtosis: 3.408354127
Mean: 3130.099202
Median Absolute Deviation (MAD): 366
Skewness: 2.083747568
Sum: 4966259215
Variance: 31115244.1
Monotonicity: Not monotonic
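
Several of the derived statistics in this block follow directly from the others. As a minimal sketch (the variable names are illustrative, and the inputs are simply the rounded figures printed in the report for brewery_id), the relationships can be checked like this:

```python
# Summary figures copied from the brewery_id block of the report.
mean = 3130.099202
std = 5578.103987
minimum, maximum = 1, 28003
q1, q3 = 143, 2372

cv = std / mean                  # coefficient of variation = std / mean
variance = std ** 2              # variance = std squared
value_range = maximum - minimum  # range = max - min
iqr = q3 - q1                    # interquartile range = Q3 - Q1

print(f"CV       = {cv:.9f}")
print(f"Variance = {variance:.1f}")
print(f"Range    = {value_range}")
print(f"IQR      = {iqr}")
```

Because the inputs are rounded, the recomputed values agree with the reported ones only up to the printed precision.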
Histogram with fixed size bins (bins=50)
Value                 Count     Frequency (%)
35                    39444     2.5%
10099                 33839     2.1%
147                   33066     2.1%
140                   28751     1.8%
287                   25191     1.6%
132                   24083     1.5%
1199                  20004     1.3%
345                   19479     1.2%
220                   16837     1.1%
30                    16107     1.0%
Other values (5830)   1329813   83.8%
Value                 Count     Frequency (%)
1                     1357      0.1%
2                     40        < 0.1%
3                     5357      0.3%
4                     7321      0.5%
5                     728       < 0.1%
Value                 Count     Frequency (%)
28003                 2         < 0.1%
28000                 1         < 0.1%
27984                 1         < 0.1%
27980                 3         < 0.1%
27945                 1         < 0.1%

brewery_name
Categorical

HIGH CARDINALITY

Distinct: 5742
Distinct (%): 0.4%
Missing: 15
Missing (%): < 0.1%
Memory size: 12.1 MiB

Boston Beer Company (Samuel Adams): 39444
Dogfish Head Brewery: 33839
Stone Brewing Co.: 33066
Sierra Nevada Brewing Co.: 28751
Bell's Brewery, Inc.: 25191
Other values (5737): 1426308

Length

Max length: 66
Median length: 23
Mean length: 23.61032246
Min length: 3

Characters and Unicode

Total characters: 37460114
Distinct characters: 132
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 672
Unique (%): < 0.1%

Sample

1st row: Vecchio Birraio
2nd row: Vecchio Birraio
3rd row: Vecchio Birraio
4th row: Vecchio Birraio
5th row: Caldera Brewing Company
Value                                Count     Frequency (%)
Boston Beer Company (Samuel Adams)   39444     2.5%
Dogfish Head Brewery                 33839     2.1%
Stone Brewing Co.                    33066     2.1%
Sierra Nevada Brewing Co.            28751     1.8%
Bell's Brewery, Inc.                 25191     1.6%
Rogue Ales                           24083     1.5%
Founders Brewing Company             20004     1.3%
Victory Brewing Company              19479     1.2%
Lagunitas Brewing Company            16837     1.1%
Avery Brewing Company                16107     1.0%
Other values (5732)                  1329798   83.8%
Histogram of lengths of the category
ValueCountFrequency (%)
brewing752184
 
13.6%
company541207
 
9.8%
brewery306096
 
5.6%
co263714
 
4.8%
179145
 
3.2%
beer95720
 
1.7%
brouwerij71874
 
1.3%
samuel49736
 
0.9%
de47007
 
0.9%
ltd45786
 
0.8%
Other values (6761)3161958
57.3%

Most occurring characters

ValueCountFrequency (%)
e3939975
 
10.5%
3931498
 
10.5%
r3198978
 
8.5%
n2341435
 
6.3%
a2041858
 
5.5%
o2009863
 
5.4%
i1909456
 
5.1%
B1714210
 
4.6%
w1306627
 
3.5%
s1172755
 
3.1%
Other values (122)13893459
37.1%

Most occurring categories

Value               Count      Frequency (%)
Lowercase Letter    26990365   72.1%
Uppercase Letter    5508479    14.7%
Space Separator     3931502    10.5%
Other Punctuation   809694     2.2%
Open Punctuation    69272      0.2%
Close Punctuation   69272      0.2%
Dash Punctuation    56826      0.2%
Decimal Number      22848      0.1%
Control             855        < 0.1%
Final Punctuation   720        < 0.1%
Other values (2)    281        < 0.1%

Most frequent character per category

ValueCountFrequency (%)
e3939975
14.6%
r3198978
11.9%
n2341435
 
8.7%
a2041858
 
7.6%
o2009863
 
7.4%
i1909456
 
7.1%
w1306627
 
4.8%
s1172755
 
4.3%
g1135712
 
4.2%
y1133858
 
4.2%
Other values (47)6799848
25.2%
ValueCountFrequency (%)
B1714210
31.1%
C989535
18.0%
S417502
 
7.6%
A304760
 
5.5%
H193868
 
3.5%
L191839
 
3.5%
D178746
 
3.2%
T171416
 
3.1%
G164219
 
3.0%
P159544
 
2.9%
Other values (27)1022840
18.6%
ValueCountFrequency (%)
.469257
58.0%
&118017
 
14.6%
'99610
 
12.3%
/63031
 
7.8%
,51456
 
6.4%
#4052
 
0.5%
;4000
 
0.5%
?144
 
< 0.1%
"81
 
< 0.1%
@46
 
< 0.1%
ValueCountFrequency (%)
26832
29.9%
14501
19.7%
33349
14.7%
81795
 
7.9%
51673
 
7.3%
71471
 
6.4%
61423
 
6.2%
4727
 
3.2%
0567
 
2.5%
9510
 
2.2%
ValueCountFrequency (%)
š640
74.9%
Ž180
 
21.1%
’24
 
2.8%
Š5
 
0.6%
ž2
 
0.2%
“2
 
0.2%
”2
 
0.2%
ValueCountFrequency (%)
3931498
> 99.9%
 4
 
< 0.1%
ValueCountFrequency (%)
(69246
> 99.9%
[26
 
< 0.1%
ValueCountFrequency (%)
)69246
> 99.9%
]26
 
< 0.1%
ValueCountFrequency (%)
´7
63.6%
`4
36.4%
ValueCountFrequency (%)
-56826
100.0%
ValueCountFrequency (%)
720
100.0%
ValueCountFrequency (%)
+270
100.0%

Most occurring scripts

Value    Count      Frequency (%)
Latin    32498844   86.8%
Common   4961270    13.2%

Most frequent character per script

ValueCountFrequency (%)
e3939975
 
12.1%
r3198978
 
9.8%
n2341435
 
7.2%
a2041858
 
6.3%
o2009863
 
6.2%
i1909456
 
5.9%
B1714210
 
5.3%
w1306627
 
4.0%
s1172755
 
3.6%
g1135712
 
3.5%
Other values (84)11727975
36.1%
ValueCountFrequency (%)
3931498
79.2%
.469257
 
9.5%
&118017
 
2.4%
'99610
 
2.0%
(69246
 
1.4%
)69246
 
1.4%
/63031
 
1.3%
-56826
 
1.1%
,51456
 
1.0%
26832
 
0.1%
Other values (28)26251
 
0.5%

Most occurring blocks

Value         Count      Frequency (%)
ASCII         37376747   99.8%
None          82647      0.2%
Punctuation   720        < 0.1%

Most frequent character per block

ValueCountFrequency (%)
e3939975
 
10.5%
3931498
 
10.5%
r3198978
 
8.6%
n2341435
 
6.3%
a2041858
 
5.5%
o2009863
 
5.4%
i1909456
 
5.1%
B1714210
 
4.6%
w1306627
 
3.5%
s1172755
 
3.1%
Other values (70)13810092
36.9%
ValueCountFrequency (%)
ä24216
29.3%
ö15340
18.6%
è8006
 
9.7%
ø6231
 
7.5%
é5753
 
7.0%
í4427
 
5.4%
ü4010
 
4.9%
Ø3438
 
4.2%
ô3009
 
3.6%
ý1538
 
1.9%
Other values (41)6679
 
8.1%
ValueCountFrequency (%)
720
100.0%

review_time
Real number (ℝ≥0)

Distinct: 1577960
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1224089280
Minimum: 840672001
Maximum: 1326285348
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 840672001
5-th percentile: 1071431292
Q1: 1173224188
median: 1239202882
Q3: 1288568405
95-th percentile: 1318389924
Maximum: 1326285348
Range: 485613347
Interquartile range (IQR): 115344217

Descriptive statistics

Standard deviation: 76544274.54
Coefficient of variation (CV): 0.06253161088
Kurtosis: -0.3136982976
Mean: 1224089280
Median Absolute Deviation (MAD): 54219357.5
Skewness: -0.7352727768
Sum: 1.942157189 × 10^15
Variance: 5.859025965 × 10^15
Monotonicity: Not monotonic
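
The report treats review_time as a plain number. Its magnitude suggests Unix timestamps in seconds, but that is an assumption, since the report itself does not state the unit. Under that assumption, a small sketch (the helper name `to_utc` is illustrative) turns the quantiles into calendar dates:

```python
from datetime import datetime, timezone


def to_utc(timestamp_seconds: int) -> datetime:
    """Interpret an integer as seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(timestamp_seconds, tz=timezone.utc)


# Values copied from the review_time quantile statistics.
review_time_quantiles = {
    "minimum": 840672001,
    "median": 1239202882,
    "maximum": 1326285348,
}

for label, seconds in review_time_quantiles.items():
    print(f"{label:>8}: {to_utc(seconds).isoformat()}")

# Time span covered by the reviews, in days.
span_days = (to_utc(review_time_quantiles["maximum"])
             - to_utc(review_time_quantiles["minimum"])).days
print(f"span: {span_days} days")
```

If the column were in another unit (for example milliseconds), the conversion would need to be adjusted accordingly.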
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
110177280021
 
< 0.1%
9263808018
 
< 0.1%
10311012008
 
< 0.1%
9808128017
 
< 0.1%
8970912017
 
< 0.1%
10221120017
 
< 0.1%
9048672016
 
< 0.1%
9029664016
 
< 0.1%
9330336016
 
< 0.1%
9262944015
 
< 0.1%
Other values (1577950)1586533
> 99.9%
Value        Count   Frequency (%)
840672001    1       < 0.1%
884390401    1       < 0.1%
884649601    1       < 0.1%
885340801    1       < 0.1%
885427201    1       < 0.1%
Value        Count   Frequency (%)
1326285348   1       < 0.1%
1326284970   1       < 0.1%
1326276656   1       < 0.1%
1326275049   1       < 0.1%
1326274454   1       < 0.1%

review_overall
Real number (ℝ≥0)

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.815580853
Minimum: 0
Maximum: 5
Zeros: 7
Zeros (%): < 0.1%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2.5
Q1: 3.5
median: 4
Q3: 4.5
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7206218681
Coefficient of variation (CV): 0.1888629532
Kurtosis: 1.631038958
Mean: 3.815580853
Median Absolute Deviation (MAD): 0.5
Skewness: -1.023968713
Sum: 6053854
Variance: 0.5192958767
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value   Count    Frequency (%)
4       582764   36.7%
4.5     324385   20.4%
3.5     301817   19.0%
3       165644   10.4%
5       91320    5.8%
2.5     58523    3.7%
2       38225    2.4%
1.5     12975    0.8%
1       10954    0.7%
0       7        < 0.1%
Value   Count    Frequency (%)
0       7        < 0.1%
1       10954    0.7%
1.5     12975    0.8%
2       38225    2.4%
2.5     58523    3.7%
Value   Count    Frequency (%)
5       91320    5.8%
4.5     324385   20.4%
4       582764   36.7%
3.5     301817   19.0%
3       165644   10.4%

review_aroma
Real number (ℝ≥0)

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.735636078
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2.5
Q1: 3.5
median: 4
Q3: 4
95-th percentile: 4.5
Maximum: 5
Range: 4
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.6976167288
Coefficient of variation (CV): 0.1867464374
Kurtosis: 1.145196752
Mean: 3.735636078
Median Absolute Deviation (MAD): 0.5
Skewness: -0.838530526
Sum: 5927012.5
Variance: 0.4866691003
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value   Count    Frequency (%)
4       557383   35.1%
3.5     365312   23.0%
4.5     271450   17.1%
3       200030   12.6%
2.5     66359    4.2%
5       64117    4.0%
2       42566    2.7%
1.5     12524    0.8%
1       6873     0.4%
Value   Count    Frequency (%)
1       6873     0.4%
1.5     12524    0.8%
2       42566    2.7%
2.5     66359    4.2%
3       200030   12.6%
Value   Count    Frequency (%)
5       64117    4.0%
4.5     271450   17.1%
4       557383   35.1%
3.5     365312   23.0%
3       200030   12.6%

review_appearance
Real number (ℝ≥0)

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.841641697
Minimum: 0
Maximum: 5
Zeros: 7
Zeros (%): < 0.1%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 3.5
median: 4
Q3: 4
95-th percentile: 4.5
Maximum: 5
Range: 5
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.6160927689
Coefficient of variation (CV): 0.160372262
Kurtosis: 1.738866541
Mean: 3.841641697
Median Absolute Deviation (MAD): 0.5
Skewness: -0.9024199172
Sum: 6095202.5
Variance: 0.3795702999
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value   Count    Frequency (%)
4       674186   42.5%
3.5     318529   20.1%
4.5     288108   18.2%
3       166009   10.5%
5       65398    4.1%
2.5     39493    2.5%
2       25414    1.6%
1.5     6147     0.4%
1       3323     0.2%
0       7        < 0.1%
Value   Count    Frequency (%)
0       7        < 0.1%
1       3323     0.2%
1.5     6147     0.4%
2       25414    1.6%
2.5     39493    2.5%
Value   Count    Frequency (%)
5       65398    4.1%
4.5     288108   18.2%
4       674186   42.5%
3.5     318529   20.1%
3       166009   10.5%

review_profilename
Categorical

HIGH CARDINALITY

Distinct: 33387
Distinct (%): 2.1%
Missing: 348
Missing (%): < 0.1%
Memory size: 12.1 MiB

northyorksammy: 5817
BuckeyeNation: 4661
mikesgroove: 4617
Thorpe429: 3518
womencantsail: 3497
Other values (33382): 1564156

Length

Max length: 16
Median length: 9
Mean length: 8.962746475
Min length: 3

Characters and Unicode

Total characters: 14217300
Distinct characters: 63
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10443
Unique (%): 0.7%

Sample

1st row: stcules
2nd row: stcules
3rd row: stcules
4th row: stcules
5th row: johnmichaelsen
Value                  Count     Frequency (%)
northyorksammy         5817      0.4%
BuckeyeNation          4661      0.3%
mikesgroove            4617      0.3%
Thorpe429              3518      0.2%
womencantsail          3497      0.2%
NeroFiddled            3488      0.2%
ChainGangGuy           3471      0.2%
brentk56               3357      0.2%
Phyl21ca               3179      0.2%
WesWes                 3168      0.2%
Other values (33377)   1547493   97.5%
Histogram of lengths of the category
ValueCountFrequency (%)
northyorksammy5817
 
0.4%
buckeyenation4661
 
0.3%
mikesgroove4617
 
0.3%
thorpe4293518
 
0.2%
womencantsail3497
 
0.2%
nerofiddled3488
 
0.2%
chaingangguy3471
 
0.2%
brentk563357
 
0.2%
phyl21ca3179
 
0.2%
weswes3168
 
0.2%
Other values (33377)1547493
97.6%

Most occurring characters

ValueCountFrequency (%)
e1456336
 
10.2%
a1059087
 
7.4%
r1013118
 
7.1%
o866721
 
6.1%
n764015
 
5.4%
i718329
 
5.1%
t631230
 
4.4%
s598952
 
4.2%
l570628
 
4.0%
d445265
 
3.1%
Other values (53)6093619
42.9%

Most occurring categories

Value               Count      Frequency (%)
Lowercase Letter    11922011   83.9%
Uppercase Letter    1348320    9.5%
Decimal Number      946848     6.7%
Other Punctuation   121        < 0.1%

Most frequent character per category

ValueCountFrequency (%)
e1456336
 
12.2%
a1059087
 
8.9%
r1013118
 
8.5%
o866721
 
7.3%
n764015
 
6.4%
i718329
 
6.0%
t631230
 
5.3%
s598952
 
5.0%
l570628
 
4.8%
d445265
 
3.7%
Other values (16)3798330
31.9%
ValueCountFrequency (%)
B166179
 
12.3%
S99905
 
7.4%
D92244
 
6.8%
M91717
 
6.8%
T81794
 
6.1%
C79278
 
5.9%
G72979
 
5.4%
J64353
 
4.8%
R64118
 
4.8%
A63561
 
4.7%
Other values (16)472192
35.0%
ValueCountFrequency (%)
1170728
18.0%
0126244
13.3%
2117738
12.4%
7100599
10.6%
387239
9.2%
879308
8.4%
974969
7.9%
567625
 
7.1%
467397
 
7.1%
655001
 
5.8%
ValueCountFrequency (%)
.121
100.0%

Most occurring scripts

Value    Count      Frequency (%)
Latin    13270331   93.3%
Common   946969     6.7%

Most frequent character per script

ValueCountFrequency (%)
e1456336
 
11.0%
a1059087
 
8.0%
r1013118
 
7.6%
o866721
 
6.5%
n764015
 
5.8%
i718329
 
5.4%
t631230
 
4.8%
s598952
 
4.5%
l570628
 
4.3%
d445265
 
3.4%
Other values (42)5146650
38.8%
ValueCountFrequency (%)
1170728
18.0%
0126244
13.3%
2117738
12.4%
7100599
10.6%
387239
9.2%
879308
8.4%
974969
7.9%
567625
 
7.1%
467397
 
7.1%
655001
 
5.8%

Most occurring blocks

Value   Count      Frequency (%)
ASCII   14217300   100.0%

Most frequent character per block

ValueCountFrequency (%)
e1456336
 
10.2%
a1059087
 
7.4%
r1013118
 
7.1%
o866721
 
6.1%
n764015
 
5.4%
i718329
 
5.1%
t631230
 
4.4%
s598952
 
4.2%
l570628
 
4.0%
d445265
 
3.1%
Other values (53)6093619
42.9%

review_palate
Real number (ℝ≥0)

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.743701367
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2.5
Q1: 3.5
median: 4
Q3: 4
95-th percentile: 4.5
Maximum: 5
Range: 4
Interquartile range (IQR): 0.5

Descriptive statistics

Standard deviation: 0.6822183634
Coefficient of variation (CV): 0.1822309785
Kurtosis: 1.303397287
Mean: 3.743701367
Median Absolute Deviation (MAD): 0.5
Skewness: -0.8691499712
Sum: 5939809
Variance: 0.4654218953
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value   Count    Frequency (%)
4       606711   38.2%
3.5     338585   21.3%
4.5     253102   16.0%
3       206932   13.0%
2.5     62842    4.0%
5       62190    3.9%
2       38333    2.4%
1.5     11045    0.7%
1       6874     0.4%
Value   Count    Frequency (%)
1       6874     0.4%
1.5     11045    0.7%
2       38333    2.4%
2.5     62842    4.0%
3       206932   13.0%
Value   Count    Frequency (%)
5       62190    3.9%
4.5     253102   16.0%
4       606711   38.2%
3.5     338585   21.3%
3       206932   13.0%

review_taste
Real number (ℝ≥0)

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.792860456
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2.5
Q1: 3.5
median: 4
Q3: 4.5
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7319696099
Coefficient of variation (CV): 0.1929861692
Kurtosis: 1.341669306
Mean: 3.792860456
Median Absolute Deviation (MAD): 0.5
Skewness: -0.9734324438
Sum: 6017805.5
Variance: 0.5357795098
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value   Count    Frequency (%)
4       541429   34.1%
4.5     336162   21.2%
3.5     324541   20.5%
3       166860   10.5%
5       83977    5.3%
2.5     66534    4.2%
2       41992    2.6%
1.5     15128    1.0%
1       9991     0.6%
Value   Count    Frequency (%)
1       9991     0.6%
1.5     15128    1.0%
2       41992    2.6%
2.5     66534    4.2%
3       166860   10.5%
Value   Count    Frequency (%)
5       83977    5.3%
4.5     336162   21.2%
4       541429   34.1%
3.5     324541   20.5%
3       166860   10.5%

beer_name
Categorical

HIGH CARDINALITY

Distinct: 56857
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 12.1 MiB

90 Minute IPA: 3290
India Pale Ale: 3130
Old Rasputin Russian Imperial Stout: 3111
Sierra Nevada Celebration Ale: 3000
Two Hearted Ale: 2728
Other values (56852): 1571355

Length

Max length: 75
Median length: 19
Mean length: 20.45317513
Min length: 1

Characters and Unicode

Total characters: 32451294
Distinct characters: 190
Distinct categories: 19
Distinct scripts: 5
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18908
Unique (%): 1.2%

Sample

1st row: Sausa Weizen
2nd row: Red Moon
3rd row: Black Horse Black Beer
4th row: Sausa Pils
5th row: Cauldron DIPA
Value                                 Count     Frequency (%)
90 Minute IPA                         3290      0.2%
India Pale Ale                        3130      0.2%
Old Rasputin Russian Imperial Stout   3111      0.2%
Sierra Nevada Celebration Ale         3000      0.2%
Two Hearted Ale                       2728      0.2%
Arrogant Bastard Ale                  2704      0.2%
Stone Ruination IPA                   2704      0.2%
Sierra Nevada Pale Ale                2587      0.2%
Stone IPA (India Pale Ale)            2575      0.2%
Pliny The Elder                       2527      0.2%
Other values (56847)                  1558258   98.2%
Histogram of lengths of the category
ValueCountFrequency (%)
ale390057
 
7.6%
stout139632
 
2.7%
ipa112739
 
2.2%
pale94197
 
1.8%
imperial75872
 
1.5%
porter65143
 
1.3%
63104
 
1.2%
lager52754
 
1.0%
beer43677
 
0.9%
samuel42948
 
0.8%
Other values (28738)4050017
78.9%

Most occurring characters

ValueCountFrequency (%)
3545473
 
10.9%
e3527081
 
10.9%
a2055389
 
6.3%
r2035112
 
6.3%
l1817251
 
5.6%
o1602495
 
4.9%
i1550559
 
4.8%
t1540208
 
4.7%
n1393651
 
4.3%
s1087452
 
3.4%
Other values (180)12296623
37.9%

Most occurring categories

Value               Count      Frequency (%)
Lowercase Letter    22576298   69.6%
Uppercase Letter    5373685    16.6%
Space Separator     3545492    10.9%
Decimal Number      337678     1.0%
Other Punctuation   294036     0.9%
Close Punctuation   111939     0.3%
Open Punctuation    111937     0.3%
Dash Punctuation    95525      0.3%
Other Symbol        2455       < 0.1%
Control             959        < 0.1%
Other values (9)    1290       < 0.1%

Most frequent character per category

ValueCountFrequency (%)
e3527081
15.6%
a2055389
9.1%
r2035112
9.0%
l1817251
 
8.0%
o1602495
 
7.1%
i1550559
 
6.9%
t1540208
 
6.8%
n1393651
 
6.2%
s1087452
 
4.8%
u892827
 
4.0%
Other values (58)5074273
22.5%
ValueCountFrequency (%)
A771599
14.4%
S663609
12.3%
B568115
 
10.6%
P469612
 
8.7%
I292613
 
5.4%
D253528
 
4.7%
H246781
 
4.6%
C243799
 
4.5%
W200696
 
3.7%
L198630
 
3.7%
Other values (40)1464703
27.3%
ValueCountFrequency (%)
'171493
58.3%
.63173
 
21.5%
#14162
 
4.8%
&12314
 
4.2%
/11932
 
4.1%
"6479
 
2.2%
,4149
 
1.4%
:3078
 
1.0%
!2870
 
1.0%
%2537
 
0.9%
Other values (5)1849
 
0.6%
ValueCountFrequency (%)
229
86.1%
º26
 
9.8%
2
 
0.8%
1
 
0.4%
1
 
0.4%
1
 
0.4%
1
 
0.4%
1
 
0.4%
1
 
0.4%
1
 
0.4%
Other values (2)2
 
0.8%
ValueCountFrequency (%)
0100405
29.7%
163830
18.9%
249056
14.5%
822544
 
6.7%
521438
 
6.3%
919902
 
5.9%
318320
 
5.4%
415604
 
4.6%
614188
 
4.2%
712391
 
3.7%
ValueCountFrequency (%)
Ž362
37.7%
’316
33.0%
ž154
16.1%
š98
 
10.2%
Š18
 
1.9%
–6
 
0.6%
œ3
 
0.3%
‘2
 
0.2%
ValueCountFrequency (%)
3
50.0%
2
33.3%
«1
 
16.7%
ValueCountFrequency (%)
231
98.3%
3
 
1.3%
»1
 
0.4%
ValueCountFrequency (%)
+539
98.9%
=4
 
0.7%
~2
 
0.4%
ValueCountFrequency (%)
³138
79.3%
½35
 
20.1%
²1
 
0.6%
ValueCountFrequency (%)
3545473
> 99.9%
 19
 
< 0.1%
ValueCountFrequency (%)
(111407
99.5%
[530
 
0.5%
ValueCountFrequency (%)
)111409
99.5%
]530
 
0.5%
ValueCountFrequency (%)
-95519
> 99.9%
6
 
< 0.1%
ValueCountFrequency (%)
´10
90.9%
^1
 
9.1%
ValueCountFrequency (%)
ʼ7
87.5%
1
 
12.5%
ValueCountFrequency (%)
°2455
100.0%
ValueCountFrequency (%)
$44
100.0%
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value      Count      Frequency (%)
Latin      27950006   86.1%
Common     4501045    13.9%
Han        234        < 0.1%
Katakana   6          < 0.1%
Greek      3          < 0.1%

Most frequent character per script

ValueCountFrequency (%)
e3527081
 
12.6%
a2055389
 
7.4%
r2035112
 
7.3%
l1817251
 
6.5%
o1602495
 
5.7%
i1550559
 
5.5%
t1540208
 
5.5%
n1393651
 
5.0%
s1087452
 
3.9%
u892827
 
3.2%
Other values (108)10447981
37.4%
ValueCountFrequency (%)
3545473
78.8%
'171493
 
3.8%
)111409
 
2.5%
(111407
 
2.5%
0100405
 
2.2%
-95519
 
2.1%
163830
 
1.4%
.63173
 
1.4%
249056
 
1.1%
822544
 
0.5%
Other values (50)166736
 
3.7%
ValueCountFrequency (%)
229
97.9%
1
 
0.4%
1
 
0.4%
1
 
0.4%
1
 
0.4%
1
 
0.4%
ValueCountFrequency (%)
2
33.3%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
ValueCountFrequency (%)
Ω3
100.0%

Most occurring blocks

Value              Count      Frequency (%)
ASCII              32366038   99.7%
None               84762      0.3%
Punctuation        246        < 0.1%
CJK                234        < 0.1%
Katakana           7          < 0.1%
Modifier Letters   7          < 0.1%

Most frequent character per block

ValueCountFrequency (%)
3545473
 
11.0%
e3527081
 
10.9%
a2055389
 
6.4%
r2035112
 
6.3%
l1817251
 
5.6%
o1602495
 
5.0%
i1550559
 
4.8%
t1540208
 
4.8%
n1393651
 
4.3%
s1087452
 
3.4%
Other values (76)12211367
37.7%
ValueCountFrequency (%)
é19550
23.1%
ö14822
17.5%
ä11881
14.0%
è7519
 
8.9%
ü6147
 
7.3%
ë3216
 
3.8%
ô3055
 
3.6%
°2455
 
2.9%
É2306
 
2.7%
ê1870
 
2.2%
Other values (75)11941
14.1%
ValueCountFrequency (%)
231
93.9%
6
 
2.4%
3
 
1.2%
3
 
1.2%
2
 
0.8%
1
 
0.4%
ValueCountFrequency (%)
229
97.9%
1
 
0.4%
1
 
0.4%
1
 
0.4%
1
 
0.4%
1
 
0.4%
ValueCountFrequency (%)
2
28.6%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
ValueCountFrequency (%)
ʼ7
100.0%

beer_abv
Real number (ℝ≥0)

MISSING

Distinct: 530
Distinct (%): < 0.1%
Missing: 67785
Missing (%): 4.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.042386753
Minimum: 0.01
Maximum: 57.7
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 0.01
5-th percentile: 4.5
Q1: 5.2
median: 6.5
Q3: 8.5
95-th percentile: 11
Maximum: 57.7
Range: 57.69
Interquartile range (IQR): 3.3

Descriptive statistics

Standard deviation: 2.322525993
Coefficient of variation (CV): 0.3297924516
Kurtosis: 6.961811545
Mean: 7.042386753
Median Absolute Deviation (MAD): 1.5
Skewness: 1.543406148
Sum: 10696181.23
Variance: 5.394126987
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count    Frequency (%)
5                    109144   6.9%
8                    67744    4.3%
6                    65383    4.1%
7                    59460    3.7%
9                    59183    3.7%
5.5                  59010    3.7%
10                   54780    3.5%
6.5                  48369    3.0%
5.2                  43268    2.7%
7.5                  39978    2.5%
Other values (520)   912510   57.5%
(Missing)            67785    4.3%
ValueCountFrequency (%)
0.015
 
< 0.1%
0.0517
< 0.1%
0.081
 
< 0.1%
0.111
< 0.1%
0.253
 
< 0.1%
ValueCountFrequency (%)
57.71
 
< 0.1%
432
 
< 0.1%
4176
< 0.1%
39.443
 
< 0.1%
397
 
< 0.1%

beer_beerid
Real number (ℝ≥0)

Distinct: 66055
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21712.79428
Minimum: 3
Maximum: 77317
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.1 MiB

Quantile statistics

Minimum: 3
5-th percentile: 213
Q1: 1717
median: 13906
Q3: 39441
95-th percentile: 62653
Maximum: 77317
Range: 77314
Interquartile range (IQR): 37724

Descriptive statistics

Standard deviation: 21818.336
Coefficient of variation (CV): 1.004860808
Kurtosis: -0.8339342225
Mean: 21712.79428
Median Absolute Deviation (MAD): 13217
Skewness: 0.6893969312
Sum: 3.444982338 × 10^10
Variance: 476039785.7
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count     Frequency (%)
2093                   3290      0.2%
412                    3111      0.2%
1904                   3000      0.2%
1093                   2728      0.2%
92                     2704      0.2%
4083                   2704      0.2%
276                    2587      0.2%
88                     2575      0.2%
7971                   2527      0.2%
11757                  2502      0.2%
Other values (66045)   1558886   98.3%
Value                  Count     Frequency (%)
3                      3         < 0.1%
4                      10        < 0.1%
5                      424       < 0.1%
6                      877       0.1%
7                      659       < 0.1%
Value                  Count     Frequency (%)
77317                  1         < 0.1%
77316                  1         < 0.1%
77315                  1         < 0.1%
77314                  1         < 0.1%
77313                  1         < 0.1%

Interactions

[Figures: pairwise interaction plots (72 images), not reproduced.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
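
As a rough illustration of this definition, the following sketch computes r in plain Python. The function name `pearson_r` and the toy rating lists are assumptions made for the example; they are not taken from the dataset, and this is not how pandas-profiling computes the matrix internally.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors in the covariance and in the two standard deviations
    # cancel, so plain sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical toy ratings, for illustration only.
aroma = [2.0, 2.5, 3.0, 4.5, 3.5]
taste = [1.5, 3.0, 3.0, 4.5, 4.0]

r = pearson_r(aroma, taste)
# Changing location and scale of one variable leaves r unchanged.
r_rescaled = pearson_r(aroma, [2 * t + 10 for t in taste])
print(r, r_rescaled)
```

The second call illustrates the invariance under changes of location and scale mentioned above.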

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
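
A minimal sketch of that recipe, in plain Python: rank both variables, then apply the Pearson formula to the ranks. The helper names (`average_ranks`, `spearman_rho`) and the sample data are illustrative assumptions; giving tied values the average of their ranks is a common convention and is assumed here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation, used here on rank variables."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy data the rank-based coefficient reflects the perfectly monotonic relationship, while Pearson's r is pulled below 1 by the nonlinearity.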

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
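
The pair-counting rule can be sketched directly. The version below implements exactly the formula as stated (often called tau-a), counting tied pairs as neither concordant nor discordant; that treatment of ties is an assumption of this sketch, and statistical libraries commonly default to a tie-corrected variant (tau-b), so values can differ on data with many ties such as half-point ratings. The function name and sample data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # The sign of the product tells whether x and y move in the
        # same direction (positive) or opposite directions (negative).
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in x or y: counted as neither.
    return (concordant - discordant) / len(pairs)


# Hypothetical data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

This brute-force loop is quadratic in the number of observations, so it is only meant to make the definition concrete, not to be run on the full dataset.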

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

(index) | brewery_id | brewery_name | review_time | review_overall | review_aroma | review_appearance | review_profilename | review_palate | review_taste | beer_name | beer_abv | beer_beerid
0 | 10325 | Vecchio Birraio | 1234817823 | 1.5 | 2.0 | 2.5 | stcules | 1.5 | 1.5 | Sausa Weizen | 5.0 | 47986
1 | 10325 | Vecchio Birraio | 1235915097 | 3.0 | 2.5 | 3.0 | stcules | 3.0 | 3.0 | Red Moon | 6.2 | 48213
2 | 10325 | Vecchio Birraio | 1235916604 | 3.0 | 2.5 | 3.0 | stcules | 3.0 | 3.0 | Black Horse Black Beer | 6.5 | 48215
3 | 10325 | Vecchio Birraio | 1234725145 | 3.0 | 3.0 | 3.5 | stcules | 2.5 | 3.0 | Sausa Pils | 5.0 | 47969
4 | 1075 | Caldera Brewing Company | 1293735206 | 4.0 | 4.5 | 4.0 | johnmichaelsen | 4.0 | 4.5 | Cauldron DIPA | 7.7 | 64883
5 | 1075 | Caldera Brewing Company | 1325524659 | 3.0 | 3.5 | 3.5 | oline73 | 3.0 | 3.5 | Caldera Ginger Beer | 4.7 | 52159
6 | 1075 | Caldera Brewing Company | 1318991115 | 3.5 | 3.5 | 3.5 | Reidrover | 4.0 | 4.0 | Caldera Ginger Beer | 4.7 | 52159
7 | 1075 | Caldera Brewing Company | 1306276018 | 3.0 | 2.5 | 3.5 | alpinebryant | 2.0 | 3.5 | Caldera Ginger Beer | 4.7 | 52159
8 | 1075 | Caldera Brewing Company | 1290454503 | 4.0 | 3.0 | 3.5 | LordAdmNelson | 3.5 | 4.0 | Caldera Ginger Beer | 4.7 | 52159
9 | 1075 | Caldera Brewing Company | 1285632924 | 4.5 | 3.5 | 5.0 | augustgarage | 4.0 | 4.0 | Caldera Ginger Beer | 4.7 | 52159

Last rows

(index) | brewery_id | brewery_name | review_time | review_overall | review_aroma | review_appearance | review_profilename | review_palate | review_taste | beer_name | beer_abv | beer_beerid
1586604 | 14359 | The Defiant Brewing Company | 1288890206 | 4.0 | 4.5 | 4.5 | njmoons | 3.5 | 3.5 | The Horseman's Ale | 5.2 | 33061
1586605 | 14359 | The Defiant Brewing Company | 1163291143 | 5.0 | 5.0 | 5.0 | NyackNicky | 5.0 | 5.0 | The Horseman's Ale | 5.2 | 33061
1586606 | 14359 | The Defiant Brewing Company | 1162871808 | 5.0 | 4.5 | 4.0 | blitheringidiot | 5.0 | 5.0 | The Horseman's Ale | 5.2 | 33061
1586607 | 14359 | The Defiant Brewing Company | 1162865640 | 5.0 | 5.0 | 4.5 | PopeDX | 5.0 | 4.5 | The Horseman's Ale | 5.2 | 33061
1586608 | 14359 | The Defiant Brewing Company | 1162685856 | 3.5 | 4.0 | 4.0 | treehugger02010 | 3.5 | 3.0 | The Horseman's Ale | 5.2 | 33061
1586609 | 14359 | The Defiant Brewing Company | 1162684892 | 5.0 | 4.0 | 3.5 | maddogruss | 4.0 | 4.0 | The Horseman's Ale | 5.2 | 33061
1586610 | 14359 | The Defiant Brewing Company | 1161048566 | 4.0 | 5.0 | 2.5 | yelterdow | 2.0 | 4.0 | The Horseman's Ale | 5.2 | 33061
1586611 | 14359 | The Defiant Brewing Company | 1160702513 | 4.5 | 3.5 | 3.0 | TongoRad | 3.5 | 4.0 | The Horseman's Ale | 5.2 | 33061
1586612 | 14359 | The Defiant Brewing Company | 1160023044 | 4.0 | 4.5 | 4.5 | dherling | 4.5 | 4.5 | The Horseman's Ale | 5.2 | 33061
1586613 | 14359 | The Defiant Brewing Company | 1160005319 | 5.0 | 4.5 | 4.5 | cbl2 | 4.5 | 4.5 | The Horseman's Ale | 5.2 | 33061